Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coalesce all memory for checks and reports into shared pointers #2117

Merged
merged 4 commits into from
Feb 15, 2019
Merged

Coalesce all memory for checks and reports into shared pointers #2117

merged 4 commits into from
Feb 15, 2019

Conversation

duderino
Copy link
Contributor

What this PR does / why we need it:

This tries to get all memory associated with mixer requests into shared pointers. In several places I continue to use raw pointers because I could see that they were only used in the scope of the calling function and I wanted to minimize the size of this already huge PR.

It's necessary for implementing policy check retry on transport error safely (we need to ensure that all the memory needed by a retry lambda is not freed before the lambda is invoked).

It also fixes a SIGSEGV and memory leak.

It has a nice side effect of eliminating a few deep copies of protobufs.

Memory Leak Details:

If the upstream client disconnects during an upstream check to
the policy server, memory is leaked. Example:

Leak of 375808 bytes in 367 objects allocated from:
	@ 1fef7cf google::protobuf::internal::ArenaImpl::NewBlock
	@ 1ff011e google::protobuf::internal::ArenaImpl::SerialArena::AllocateAlignedFallback
	@ 1ff09b6 google::protobuf::internal::ArenaImpl::SerialArena::AllocateAligned
	@ 1fefb78 google::protobuf::internal::ArenaImpl::AllocateAligned
	@ 61544f google::protobuf::Arena::CreateMaybeMessage
	@ 55c835 google::protobuf::MessageLite::CreateMaybeMessage
	@ 55b973 istio::mixer::v1::CheckRequest::mutable_attributes
	@ 55a0aa istio::mixerclient::MixerClientImpl::Check

SIGSEGV Details:

If too many requests are sent to the upstream policy server a
circuit breaker may be hit (by default this limits to 1024 concurrent
requests). The MixerClientImpl will free memory prematurely and SIGSEGV
with a stack trace like:

[2019-01-26 23:15:47.525][28][debug][pool] [external/envoy/source/common/http/http2/conn_pool.cc:79] max requests overflow
[2019-01-26 23:15:47.525][28][debug][router] [external/envoy/source/common/router/router.cc:532] [C0][S6223237052999230964] upstream reset
[2019-01-26 23:15:47.526][28][debug][http] [external/envoy/source/common/http/async_client_impl.cc:93] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers'
'x-envoy-overloaded', 'true'

[2019-01-26 23:15:47.526][28][debug][grpc] [src/envoy/utils/grpc_transport.cc:78] Check failed with code: 14, upstream connect error or disconnect/reset before headers
[2019-01-26 23:15:47.526][28][warning][filter] [src/istio/mixerclient/client_impl.cc:136] Mixer policy check transport error: UNAVAILABLE:upstream connect error or disconnect/reset before headers
[2019-01-26 23:15:47.526][28][debug][filter] [src/istio/mixerclient/client_impl.cc:142] Mixer policy check status: UNAVAILABLE:upstream connect error or disconnect/reset before headers
[2019-01-26 23:15:47.526][28][debug][filter] [src/envoy/http/mixer/filter.cc:152] Called Mixer::Filter : check complete UNAVAILABLE:upstream connect error or disconnect/reset before headers
[2019-01-26 23:15:47.526][28][debug][filter] [src/envoy/http/mixer/filter.cc:132] Called Mixer::Filter : encodeHeaders 3
[2019-01-26 23:15:47.526][28][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1234] [C2230][S9774838916162700467] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '69'
'content-type', 'text/plain'
'date', 'Sat, 26 Jan 2019 23:15:46 GMT'
'server', 'envoy'

[2019-01-26 23:15:47.526][28][debug][filter] [src/envoy/http/mixer/filter.cc:207] Called Mixer::Filter : onDestroy state: 3
[2019-01-26 23:15:47.526][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:125] Caught Segmentation fault, suspect faulting address 0x8
[2019-01-26 23:15:47.526][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:94] Backtrace thr<28> obj</usr/lib/x86_64-linux-gnu/libstdc++.so.6> (If unsymbolized, use tools/stack_decode.py):
[2019-01-26 23:15:47.526][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:117] thr<28> #0 0x7ff123114c04 (unknown)
[2019-01-26 23:15:47.526][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:104] thr<28> obj</home/jblatt/.cache/bazel/_bazel_jblatt/b6258e8f3eabbcfebfab41d31f03811a/sandbox/linux-sandbox/4/execroot/__main__/bazel-out/k8-fastbuild/bin/test/integration/mixer_fault_test.runfiles/__main__/test/integration/mixer_fault_test>
[2019-01-26 23:15:47.553][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #1 0x209e113 google::protobuf::internal::GeneratedMessageReflection::HasBit()
[2019-01-26 23:15:47.571][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #2 0x20923d6 google::protobuf::internal::GeneratedMessageReflection::HasField()
[2019-01-26 23:15:47.587][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #3 0x2037f03 google::protobuf::TextFormat::Printer::PrintField()
[2019-01-26 23:15:47.603][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #4 0x2036d68 google::protobuf::TextFormat::Printer::Print()
[2019-01-26 23:15:47.621][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #5 0x20367e1 google::protobuf::TextFormat::Printer::Print()
[2019-01-26 23:15:47.638][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #6 0x20332f1 google::protobuf::TextFormat::Printer::PrintToString()
[2019-01-26 23:15:47.656][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #7 0x2033144 _ZNK6google8protobuf7Message11DebugStringB5cxx11Ev
[2019-01-26 23:15:47.673][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #8 0x50af1b Envoy::Utils::GrpcTransport<>::GrpcTransport()
[2019-01-26 23:15:47.690][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #9 0x50aaff Envoy::Utils::GrpcTransport<>::GetFunc()::{lambda()#1}::operator()()
[2019-01-26 23:15:47.707][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #10 0x50a857 std::_Function_handler<>::_M_invoke()
[2019-01-26 23:15:47.724][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #11 0x582383 std::function<>::operator()()
[2019-01-26 23:15:47.758][28][critical][backtrace] [bazel-out/k8-fastbuild/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] thr<28> #12 0x580b9a istio::mixerclient::MixerClientImpl::Check()

Which issue this PR fixes

This is another breakout PR of #2107 which is in service of istio/istio#8224.

I'll follow it with two more PRs (one with more counters, another with retries on policy/quota check transport failures)

This will help find future issues related to istio/istio#8224

…nto shared pointers that live as long as a request's mixer filter instance.
@duderino
Copy link
Contributor Author

@JimmyCYJ, Piotr is giving this a review, but I think it'd be really good if you took a look at it too.

@JimmyCYJ
Copy link
Member

@JimmyCYJ, Piotr is giving this a review, but I think it'd be really good if you took a look at it too.

Sure, I will review. Thanks.

Copy link
Member

@JimmyCYJ JimmyCYJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overrall, with a comment on arena.
Could you also build proxy binary and run mixer client integration tests under istio/istio/mixer/test/client?

Copy link
Contributor

@lambdai lambdai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fundamental use-after-free should be solved by introducing Context and SharedAttributes

I have the concerns about if the above two concepts are used correctly and performance.

@duderino
Copy link
Contributor Author

@JimmyCYJ I already ran this against the mixer client tests in istio/istio. This PR will have to be merged after istio/istio pulls in the new istio/proxy: istio/istio#11591

Copy link
Contributor

@PiotrSikora PiotrSikora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, sans a few small nits.

@JimmyCYJ
Copy link
Member

@JimmyCYJ I already ran this against the mixer client tests in istio/istio. This PR will have to be merged after istio/istio pulls in the new istio/proxy: istio/istio#11591

Thanks.

/lgtm

@istio-testing istio-testing added needs-rebase Indicates a PR needs to be rebased before being merged and removed lgtm labels Feb 15, 2019
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Feb 15, 2019
@JimmyCYJ
Copy link
Member

/lgtm

@istio-testing
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: duderino, JimmyCYJ

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@duderino duderino merged commit db38d03 into istio:release-1.1 Feb 15, 2019
istio-testing pushed a commit that referenced this pull request May 1, 2019
* Forwarded attributes override statically configured Local Attributes (#2097)

* WIP

* add local and override tests

* revert attributes_builder

* white list forward attributes

* add tests with whitelist

* fix builder test for white listed attributes

* ignore istio.mixer in report (#2098)

Signed-off-by: Lizan Zhou <[email protected]>

* whitelist kSourceNamespace attribute (#2100)

* Update software in the build image used by CircleCI. (#2110)

Signed-off-by: Piotr Sikora <[email protected]>

* Add flag indicating current semantics of report batch (#2111)

* Add flag indicating current semantics of report batch

* Fix Unit Test

* Update Envoy SHA to latest with deterministic hash (master). (#2108)

* Update Envoy SHA to latest with deterministic hash (master).

Signed-off-by: Piotr Sikora <[email protected]>

* review: use lld linker for clang-asan and clang-tsan.

Signed-off-by: Piotr Sikora <[email protected]>

* review: export PATH.

Signed-off-by: Piotr Sikora <[email protected]>

* Update Envoy SHA to latest with deterministic hash (release-1.1). (#2109)

* Update Envoy SHA to latest with deterministic hash (release-1.1).

Signed-off-by: Piotr Sikora <[email protected]>

* review: use lld linker for clang-asan and clang-tsan.

Signed-off-by: Piotr Sikora <[email protected]>

* review: export PATH.

Signed-off-by: Piotr Sikora <[email protected]>

* remove unused bytestring include from sni_verifier for openssl (#2112)

* Added client/server load test framework to find mixer faults. (#2105)

This is a load generator client + origin server I created to test the Mixer filter under various fault conditions using Envoy's client and server stacks. This work falls under [istio/istio#8224](istio/istio#8224)

@PiotrSikora @jplevyak would love your feedback because it could be used for the wasm work and especially because this is the first >=C++11 code I've written

See test/integration/int_client_server_test.cc if you want to start with an example for context.

Another example that uses this framework to sandwich Envoy+Mixer filter between the load generator and multiple origin servers simulating Mixer servers can be found in [istio/istio#8224](istio/istio#8224)

* Warn user of using mTLS PERMISSIVE mode and suggest to upgrade to STRICT mode (#2114)

* Warn user of using mTLS PERMISSIVE mode and suggest to upgrade to STRICT mode.

Signed-off-by: Yangmin Zhu <[email protected]>

* fix format

* check in constructor

* Update to latest istio/api on release-1.1 branch (#2115)

* Update to latest istio/api on release-1.1 branch

* Update istio/api to latest release-1.1

* Added simple logging abstraction so mixer client logs can be relayed to envoy logs. (#2116)

* Added simple logging abstraction so mixer client logs can be relayed to envoy logs when running inside envoy, stderr when running standalone.

* Log threshold guards that prevent needless serialization of logging arguments are now embedded in the log macros.

* Format

* Added do/while guards around logging statements.

* Coalesce all memory for checks and reports into shared pointers (#2117)

* Coalesce all memory for policy check requests and telemetry reports into shared pointers that live as long as a request's mixer filter instance.

* A few small fixups for the code review.

* Address some minor nits from code review.

* Additional counters for mixer policy check (#2118)

* Coalesce all memory for policy check requests and telemetry reports into shared pointers that live as long as a request's mixer filter instance.

* A few small fixups for the code review.

* Added finer-grained counters to mixer policy check

* Add retries to policy checks on failed transport error (#2113)

* Add configurable retry to policy/quota checks that failed due to transport error.

* Added assertions on mixer filter stats to mixer fault test.

* Reformat

* Fix inaccurate comment.
`

* Fix asan warning (thanks @silentdai!) and exclude mixer_fault_test from
the asan and tsan sanitizers since it always times out.

* Fix bad prefix check

* Pull in latest istio/api from release-1.1 branch (#2120)

* Add Joshua into proxy OWNER (#2121)

* log authn permissive mode only when config is received (#2125)

* log authn permissive mode only when config is received

Signed-off-by: Yangmin Zhu <[email protected]>

* fix format

* fix build

* clang-6/gcc: compiler barking fix (#2123)

* compiler barking

Signed-off-by: Kuat Yessenov <[email protected]>

* piotrs fix

Signed-off-by: Kuat Yessenov <[email protected]>

* Add additional telemetry report counters (#2128)

* Added counters to track telemetry report result.

* reformat

* replace tabs with spaces

* Replace more tab with spaces.

* New api sha for proxy (#2130)

* API sha just changed, chanign it again for proxy (#2131)

* Remove myself from owners add utka instead (#2129)

* implement upstream secure bit (#2133)

Signed-off-by: Kuat Yessenov <[email protected]>

* Deflake macos MixerFaultTest by broadening assertion ranges. (#2126)

* Deflake macos MixerFaultTest by broadening assertion ranges.

Fix flake in macos tests that was introduced by #2113

* Cleanup a few readability issues and add an assertion.

* More redability changes.

* API sha for proxy (#2136)

* Revert "implement upstream secure bit (#2133)" (#2135)

This reverts commit d857bdd.

* Add the support of bypassing JWT authn for CORS requests (#2139)

* Add the support of bypassing JWT authn for CORS requests

* Bail out earlier for CORS preflight requests

* Use OPTIONS constant value from Envoy

* Remove changing to lowercase

* Add more checks for CORS preflight requests (#2140)

* Rc3. new API sha for proxy. (#2146)

* API sha for proxy

* API sha for proxy

* update envoy with latest build fixes (#2147)

* update envoy with latest build fixes

Signed-off-by: Lizan Zhou <[email protected]>

* update protobuf to match envoy

Signed-off-by: Lizan Zhou <[email protected]>

* timeSystem -> timeSource

Signed-off-by: Lizan Zhou <[email protected]>

* requesting to add myself as a reviewer/approver (#2148)

I have 39 commits in this repo.

* update envoy to pick up TLS logging for HTTP upstream (#2149)

Signed-off-by: Lizan Zhou <[email protected]>

* Building 1.1rc4 (#2150)

* fix build

Signed-off-by: Lizan Zhou <[email protected]>

* fix format

Signed-off-by: Lizan Zhou <[email protected]>

* fix status match

Signed-off-by: Lizan Zhou <[email protected]>

* Fixes environment-dependent failures in MixerFaultTest (#2156)

* Removed explicit log-level setting from tests, as it was interfering with cli '-l' option (#2155)

* Update_Dependencies (#2178)

* Update envoy sha and fix bulid break (#2179)

* update envoy sha

* fix build

* Remove bazel shutdown from make deb

* Ignore error code returned from bazel shutdown
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants